Ranking bias in deep web size estimation using capture recapture method
Abstract
Many deep web data sources are ranked data sources: they rank the matched documents and return at most the top k results, even when more than k documents match the query. When estimating the size of such a ranked deep web data source, it is well known that there is a ranking bias: traditional methods tend to underestimate the size when queries overflow (i.e., match more documents than the return limit). Numerous estimation methods have been proposed to overcome the ranking bias. Some avoid overflowing queries during the sampling process, and others adjust the initial estimate using a fixed function. We observe that the overflow rate has a direct impact on the accuracy of the estimate. Under certain conditions, the actual size is close to the estimate obtained by the unranked model multiplied by the overflow rate. Based on this result, this paper proposes a method that allows overflowing queries in the sampling process.
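The following Python code is a minimal sketch of the correction described above, written under stated assumptions. It uses the textbook two-sample Lincoln-Petersen estimator, N ≈ n1 · n2 / m, as a stand-in for the unranked capture-recapture model. The paper's exact estimator may differ. The sketch also assumes that the overflow rate is the total number of matched documents divided by the total number actually returned, where each query returns at most k documents. The names `lincoln_petersen`, `overflow_rate` and `adjusted_estimate` are hypothetical and are not taken from the paper.

```python
"""Sketch: capture-recapture size estimate scaled by an overflow rate.

Assumptions (not confirmed by the paper): the unranked model is the
two-sample Lincoln-Petersen estimator, and the overflow rate is
total matched documents / total returned documents.
"""

from typing import Sequence, Set


def lincoln_petersen(sample1: Set[str], sample2: Set[str]) -> float:
    """Unranked two-sample estimate N ~ n1 * n2 / m."""
    overlap = len(sample1 & sample2)  # documents "recaptured" in both samples
    if overlap == 0:
        raise ValueError("samples do not overlap; estimate is undefined")
    return len(sample1) * len(sample2) / overlap


def overflow_rate(match_counts: Sequence[int], k: int) -> float:
    """Hypothetical definition: documents matched / documents returned.

    Each query returns at most k documents, so a query matching m
    documents contributes min(m, k) returned documents.
    """
    matched = sum(match_counts)
    returned = sum(min(m, k) for m in match_counts)
    if returned == 0:
        raise ValueError("no documents were returned")
    return matched / returned


def adjusted_estimate(
    sample1: Set[str],
    sample2: Set[str],
    match_counts: Sequence[int],
    k: int,
) -> float:
    """Unranked estimate multiplied by the overflow rate."""
    return lincoln_petersen(sample1, sample2) * overflow_rate(match_counts, k)


if __name__ == "__main__":
    s1 = {"d1", "d2", "d3", "d4"}
    s2 = {"d3", "d4", "d5", "d6"}
    print(lincoln_petersen(s1, s2))
    print(adjusted_estimate(s1, s2, [10, 5, 20], k=10))
```

Consider two samples of four documents each that share two documents. They give an unranked estimate of 4 · 4 / 2 = 8. Suppose the sampling queries matched 10, 5 and 20 documents with k = 10. Then 35 documents were matched, but only 25 were returned. The assumed overflow rate is therefore 35 / 25 = 1.4, and the adjusted estimate is about 11.2. When no query overflows, the rate is 1 and the unranked estimate is left unchanged.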
Similar resources
A comparison of linear transect and capture recapture methods results in Iranian Jerboa population density and abundance estimation in Mirabad plains, Shahreza
From spring 2008 to fall 2010, the population abundance of the Iranian Jerboa was estimated using distance (linear transect) and capture-recapture methods in the Mirabad plains near Shahreza city in Isfahan Province. During the study period, in the species' active season excluding the breeding period, we tried to live-trap, mark, release and recapture individuals based on the Schnabel method ...
Estimation of Maternal Mortality Rate in Iran from 2010 to 2014 Using Capture-Recapture Method
Ayat Ahmadi¹, Bahareh Yazdizadeh², Alireza Zemestani³*. ¹Assistant professor of Epidemiology, Knowledge Utilization Research Center, Tehran University of Medical Sciences, Tehran, Iran. ²Associate professor of Epidemiology, Knowledge Utilization Research Center, Tehran University of Medical Science...
Estimation of Number of Addicts to Drug Abuse Addicts by Using Capture-Recapture Method in Ilam City
Introduction: Addiction is one of the most important psychosocial harms in our country. Estimating the number of addicts in a community has always been difficult and controversial. The aim of the present study was to estimate the number of drug addicts in Ilam city using the capture-recapture method. Materials & methods: Data were collected using a questionnaire based on ...
Estimation of Road Traffic Mortality in Kurdistan Province, Iran, During 2004-2009, Using Capture-Recapture Method
Background: To reduce traffic injuries in the country, health professionals need accurate estimates of road traffic deaths. The organizations in charge publish multiple and sometimes inconsistent statistics, which create a high degree of uncertainty for planners and decision makers. Several methods are available for obtaining an accurate estimate. Among them, the capture-recapture method ...
Journal: Data Knowl. Eng.
Volume: 69
Publication date: 2010